# Video Question Answering
## LLaVA-Video-7B-Qwen2-TPO
ruili0 · MIT · Video-to-Text · Transformers · 490 downloads · 1 like

LLaVA-Video-7B-Qwen2-TPO is a video understanding model built on LLaVA-Video-7B-Qwen2 with temporal preference optimization (TPO), and performs strongly across multiple video benchmarks.
## mPLUG-Owl3-1B-241014
mPLUG · Apache-2.0 · Text-to-Image · Safetensors · English · 617 downloads · 2 likes

mPLUG-Owl3 is a multimodal large language model that targets the challenges of long image-sequence understanding; its Hyper Attention mechanism significantly improves processing speed and the sequence lengths it can handle.
## VideoChat2-HD-Stage4-Mistral-7B-hf
OpenGVLab · MIT · Video-to-Text · Safetensors · 393 downloads · 3 likes

VideoChat2-HD-hf is a multimodal video understanding model built on Mistral-7B, focused on video-to-text tasks.
## Tarsier-7b
omni-research · Video-to-Text · Transformers · 635 downloads · 23 likes

Tarsier-7b is an open-source large video-language model from the Tarsier series, specializing in generating high-quality video descriptions alongside strong general video understanding.
## CogVLM2-Video-Llama3-Chat
THUDM · Other license · Text-to-Video · Transformers · English · 2,384 downloads · 48 likes

CogVLM2-Video is a high-performance video understanding model that reaches state-of-the-art results on multiple video question-answering tasks and can complete video understanding within one minute.
## LLaVA-NeXT-Video-7B-DPO-hf
llava-hf · Video-to-Text · Transformers · English · 12.61k downloads · 9 likes

LLaVA-NeXT-Video is an open-source multimodal chatbot optimized through mixed training on video and image data, with strong video understanding capabilities.
## LLaVA-NeXT-Video-7B-hf
llava-hf · Text-to-Video · Transformers · English · 65.95k downloads · 88 likes

LLaVA-NeXT-Video is an open-source multimodal chatbot that gains its video understanding capabilities through mixed training on video and image data, reaching SOTA level among open-source models on the VideoMME benchmark.
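As a rough orientation for the Transformers-based models above, the sketch below shows the two cheap preprocessing steps a LLaVA-NeXT-Video query typically needs: uniform frame sampling and prompt construction. The model id and the `LlavaNextVideo*` classes named in the comments are real Transformers APIs, but the 8-frame clip length and the exact chat template here are illustrative assumptions, not the model card's authoritative recipe.

```python
# Hypothetical sketch of preparing a query for llava-hf/LLaVA-NeXT-Video-7B-hf.
# Frame count and prompt template are assumptions for illustration.
import numpy as np


def sample_frames(total_frames: int, num_frames: int = 8) -> list[int]:
    """Uniformly sample frame indices; video LLMs expect a fixed-length clip."""
    return np.linspace(0, total_frames - 1, num_frames, dtype=int).tolist()


def build_prompt(question: str) -> str:
    """LLaVA-style USER/ASSISTANT chat format with a <video> placeholder token."""
    return f"USER: <video>\n{question} ASSISTANT:"


if __name__ == "__main__":
    # The heavy part (downloads multi-GB weights), shown for orientation only:
    # from transformers import (LlavaNextVideoProcessor,
    #                           LlavaNextVideoForConditionalGeneration)
    # processor = LlavaNextVideoProcessor.from_pretrained(
    #     "llava-hf/LLaVA-NeXT-Video-7B-hf")
    # model = LlavaNextVideoForConditionalGeneration.from_pretrained(
    #     "llava-hf/LLaVA-NeXT-Video-7B-hf", device_map="auto")
    print(sample_frames(120, 8))   # [0, 17, 34, 51, 68, 85, 102, 119]
    print(build_prompt("What is happening in this video?"))
```

Uniform sampling is the usual default because it covers the whole clip regardless of its length; shot-boundary-aware sampling is a common refinement.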
## git-large-msrvtt-qa
microsoft · MIT · Image-to-Text · Transformers · multiple languages · 108 downloads · 2 likes

GIT is a Transformer decoder conditioned on both CLIP image tokens and text tokens, fine-tuned here for the MSRVTT-QA video question-answering task.
## git-base-msrvtt-qa
microsoft · MIT · Image-to-Text · Transformers · multiple languages · 84 downloads · 2 likes

GIT is a Transformer decoder conditioned on CLIP image tokens and text tokens for vision-language tasks, fine-tuned on MSRVTT-QA.
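The GIT checkpoints consume a short, fixed-length clip of frames rather than a whole video. The sketch below only shows the cheap input-shaping step; the checkpoint id in the comments is the real one, but the 6-frame clip length and the (T, H, W, 3) stacking convention are assumptions for illustration, not Microsoft's reference code.

```python
# Hypothetical sketch: shaping sampled frames for microsoft/git-base-msrvtt-qa.
# The 6-frame clip length is an assumption for illustration.
import numpy as np

NUM_FRAMES = 6  # assumed fixed clip length for the video QA checkpoints


def stack_frames(frames: list) -> np.ndarray:
    """Stack sampled RGB frames of shape (H, W, 3) into one (T, H, W, 3) clip."""
    if len(frames) != NUM_FRAMES:
        raise ValueError(f"expected {NUM_FRAMES} frames, got {len(frames)}")
    return np.stack(frames)


if __name__ == "__main__":
    # Heavy part, shown for orientation only:
    # from transformers import AutoProcessor, AutoModelForCausalLM
    # processor = AutoProcessor.from_pretrained("microsoft/git-base-msrvtt-qa")
    # model = AutoModelForCausalLM.from_pretrained("microsoft/git-base-msrvtt-qa")
    clip = stack_frames([np.zeros((224, 224, 3), dtype=np.uint8)] * NUM_FRAMES)
    print(clip.shape)  # (6, 224, 224, 3)
```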